ABSTRACT

determine the conditions under which an error detection scheme
based on strict redundancy can be used to increase confidence in the
results of parallel computations. This study shows that the issues of
speed and reliability of parallel processors are interdependent and must
be considered jointly at the design stage.




1.  INTRODUCTION 

Recent technological developments have made parallel processing a
viable option for achieving desired computational speed. Given a problem
of interest and an associated procedure to obtain its solution, a
cluster of $\beta$ computing elements may be used to produce the solution
in the required time, provided that an appropriate decomposition of the
procedure can be found. For real-time processing problems where the
speed constraints may be quite severe, the number of computing elements
required can be large. Besides the difficulty associated with the
decomposition of the solution procedure, the use of a large number of
computing elements introduces a new set of problems. In particular, the
probability that all computing elements produce correct results becomes
vanishingly small as the number of computing elements increases. Thus,
the computing cluster may produce a result in the required time, but the
probability that the result is also the solution of the problem of
interest decreases with the number of computing elements in the cluster.
It is clear, therefore, that the issues of speed and reliability are
interdependent and cannot be treated separately.

The quantity that characterizes the reliability of a computing
cluster is $P_C$, the probability that the output of the cluster is
correct. In order to analyze this and other quantities introduced later
on, we shall adopt the following hypothesis:

Hypothesis 1:

(i) the input to the cluster is correct.

(ii) each computing element in the cluster has the same probability $p$
of being non-faulty.

(iii) the computing elements fail independently.

Therefore, if all the computing elements in a cluster are fault-free,
the cluster output will be correct. The converse is not necessarily
true, since a computing element may be faulty without affecting
the cluster output. It follows from Hypothesis 1 that a lower bound for
$P_C$ is given by

$$P_C \ge p^{\beta} = P_{C,m}. \tag{1}$$
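As a rough numerical illustration of how quickly this lower bound shrinks with cluster size, the following minimal sketch evaluates $p^{\beta}$ for a few cluster sizes. The function name `cluster_lower_bound` and the sample value $p = 0.99$ are assumptions made for illustration only.

```python
def cluster_lower_bound(p: float, beta: int) -> float:
    """Lower bound p**beta on the probability that a cluster of beta
    independently failing elements produces a correct output."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if beta < 1:
        raise ValueError("beta must be a positive integer")
    return p ** beta


if __name__ == "__main__":
    # Illustrative element reliability; not a value taken from the paper.
    for beta in (1, 10, 100, 1000):
        print(beta, cluster_lower_bound(0.99, beta))
```

Even with highly reliable elements, the bound decays geometrically in the number of elements, which is the effect described in the Introduction.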


The quantity $p$ depends on the type of computing element used and on
the time interval over which we are interested in the output. This
means that all the probabilistic quantities discussed in this paper are
with reference to the same time interval. For example, if $p$ is the
probability that a computing element remains non-faulty for twenty-four
hours, then $P_{C,m}$ is a lower bound on the probability that the output
of a computing cluster is correct over the same twenty-four hour period.
Given $p$ and $\beta$, the value of $P_{C,m}$ may not be large enough for
our purposes, and therefore our confidence in obtaining the correct
result will not be high enough. One way to increase our confidence in the
correctness of the output is to try to detect output errors, and then to
accept the output only when no error has been detected. In this case,
there are two quantities of interest: the probability that the output of
the original cluster is correct given that we accept it, and the
probability that we reject the output of the original cluster given that
it is correct (false alarm).

While an off-line fault detection scheme might allow us to ascertain
that no hardware fault is present during the application of the
tests, we would have no assurance that the cluster remains non-faulty
during the actual computation of the desired result. Since we are
interested in the correctness of the results produced by the computing
cluster and not in the possible existence of hardware faults, and since
we cannot ascertain the correctness of the results before they are
produced, we therefore need a concurrent error detection scheme. One way
to implement such a scheme is to use strict redundancy: replicate the
initial computing cluster ($CC_1$) $\alpha - 1$ times, send the original
input to all the clusters ($CC_1, CC_2, \ldots, CC_\alpha$), and then
compare their outputs in order to produce a boolean variable $b$ that
equals zero if all cluster outputs are identical, and that equals one
otherwise. If $b = 0$, we accept the output of $CC_1$; if $b = 1$, we
reject it. This approach is appealing because, once a computing cluster
that meets the computational speed requirements has been designed, the
replication does not involve any additional design effort.
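The comparison step described above can be sketched in a few lines. This is only an illustrative model: the names `detector` and `accept`, and the representation of cluster outputs as a Python sequence of comparable values, are assumptions and not part of the original scheme description.

```python
from typing import Optional, Sequence, Tuple


def detector(outputs: Sequence[object]) -> int:
    """Return b = 0 if all cluster outputs are identical, b = 1 otherwise."""
    if not outputs:
        raise ValueError("at least one cluster output is required")
    first = outputs[0]
    return 0 if all(out == first for out in outputs[1:]) else 1


def accept(outputs: Sequence[object]) -> Tuple[bool, Optional[object]]:
    """Accept the output of the first cluster (CC_1) only when b = 0."""
    if detector(outputs) == 0:
        return True, outputs[0]
    return False, None


if __name__ == "__main__":
    print(accept([42, 42, 42]))  # all clusters agree: output accepted
    print(accept([42, 41, 42]))  # disagreement: output rejected
```

Note that the detector only checks agreement: if every cluster produced the same incorrect value, the sketch would accept it.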

Although similar error detection schemes have been used in various
contexts for some time (see [AVI78] and the references therein), no
complete analysis of their usefulness under any reasonable set of
assumptions has been carried out. Such an analysis is presented in
Section 2, and the consequent implications for reliable parallel
processing are detailed in Section 3.

2.  ANALYSIS  OF  THE  ERROR  DETECTION  SCHEME

In this paper, we shall assume that the error detector is always
non-faulty. In other words, we shall adopt the following hypothesis:

Hypothesis 2: The detector produces $b = 0$ if and only if all cluster
outputs are identical.

Let $P_{CD}(\alpha,\beta)$ be the probability that the output of $CC_1$
is correct given $b = 0$, and let $P_{FA}(\alpha,\beta)$ be the
probability that $b = 1$ given that the output of $CC_1$ is correct.
Lower and upper bounds for $P_{CD}(\alpha,\beta)$ and
$P_{FA}(\alpha,\beta)$, respectively, will now be derived.


Given $\alpha$ clusters, each containing $\beta$ computing elements, let
$E_j(\alpha,\beta)$ be the event that exactly $j$ clusters are faulty. A
cluster is faulty if and only if at least one of its elements is faulty,
and therefore the probability that a cluster is faulty is
$1-p^{\beta}$. As a result,

$$P(E_j(\alpha,\beta)) = \binom{\alpha}{j} (1-p^{\beta})^{j}\, p^{\beta(\alpha-j)}. \tag{2}$$


Let $B_j(\alpha,\beta)$ be the event that exactly $j$ cluster outputs are
incorrect. It is clear that if no more than $k$ clusters are faulty, then
at most $k$ cluster outputs can be incorrect, and if at least $k$ cluster
outputs are incorrect, then at least $k$ clusters are faulty. Thus

$$\sum_{j=0}^{k} P(B_j(\alpha,\beta)) \ge \sum_{j=0}^{k} P(E_j(\alpha,\beta)), \tag{3}$$

$$\sum_{j=k}^{\alpha} P(B_j(\alpha,\beta)) \le \sum_{j=k}^{\alpha} P(E_j(\alpha,\beta)). \tag{4}$$


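Lemma 1: Under Hypotheses 1 and 2,

$$P_{CD}(\alpha,\beta) \ge \frac{p^{\beta\alpha}}{p^{\beta\alpha} + (1-p^{\beta})^{\alpha}} = P_{CD,m}(\alpha,\beta). \tag{5}$$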


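Proof: By definition

$$P_{CD}(\alpha,\beta) = P(CC_1 \text{ output correct} \mid b=0)
= \frac{P(CC_1 \text{ output correct and } b=0)}{P(b=0)}.$$

Using Hypothesis 2, it is clear that

$$P(CC_1 \text{ output correct and } b=0) = P(B_0(\alpha,\beta)),$$

$$P(b=0) = P(B_0(\alpha,\beta) \text{ and } b=0) + P(B_\alpha(\alpha,\beta) \text{ and } b=0),$$

$$P(B_0(\alpha,\beta) \text{ and } b=0) = P(B_0(\alpha,\beta)),$$

and thus

$$P_{CD}(\alpha,\beta) = \frac{P(B_0(\alpha,\beta))}{P(B_0(\alpha,\beta)) + P(B_\alpha(\alpha,\beta) \text{ and } b=0)}. \tag{6}$$

Now

$$P(B_\alpha(\alpha,\beta) \text{ and } b=0) \le P(B_\alpha(\alpha,\beta)),$$

and (3) and (4) imply

$$P(B_0(\alpha,\beta)) \ge P(E_0(\alpha,\beta)),$$

$$P(B_\alpha(\alpha,\beta)) \le P(E_\alpha(\alpha,\beta)).$$

Therefore

$$P_{CD}(\alpha,\beta) \ge \frac{P(E_0(\alpha,\beta))}{P(E_0(\alpha,\beta)) + P(E_\alpha(\alpha,\beta))}
= \frac{p^{\beta\alpha}}{p^{\beta\alpha} + (1-p^{\beta})^{\alpha}}. \qquad \Box$$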


The behavior of the lower bound $P_{CD,m}(\alpha,\beta)$ as a function of
$\alpha$ is characterized in the following lemma.


Lemma 2: Suppose that Hypotheses 1 and 2 are satisfied. If
$p^{\beta} > 0.5$, then $P_{CD,m}(\alpha,\beta)$ converges to one strictly
monotonically as $\alpha$ goes to infinity. If $p^{\beta} = 0.5$, then
$P_{CD,m}(\alpha,\beta) = 0.5$ for all $\alpha \ge 1$. If
$p^{\beta} < 0.5$, then $P_{CD,m}(\alpha,\beta)$ converges to zero strictly
monotonically as $\alpha$ goes to infinity.


Proof: Let $q = p^{\beta}$. Then (5) can be written as

$$P_{CD,m}(\alpha,\beta) = \frac{q^{\alpha}}{q^{\alpha} + (1-q)^{\alpha}} = \frac{1}{1 + ((1-q)/q)^{\alpha}}.$$

If $p^{\beta} > 0.5$, then $(1-q)/q < 1$ and $P_{CD,m}(\alpha,\beta)$ is
clearly a strictly increasing function of $\alpha$ with a limiting value
of one. If $p^{\beta} = 0.5$, then $(1-q)/q = 1$ and obviously
$P_{CD,m}(\alpha,\beta) = 0.5$ for all $\alpha \ge 1$. If
$p^{\beta} < 0.5$, then $(1-q)/q > 1$ and $P_{CD,m}(\alpha,\beta)$ is a
strictly decreasing function of $\alpha$ with a limiting value of zero. □
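As an illustrative sketch (not part of the original analysis), the lower bound of Lemma 1 can be evaluated numerically to observe the three regimes of Lemma 2. The function name `p_cd_lower` and the sample values of $q = p^{\beta}$ are assumptions chosen for illustration; the direct formula is suitable only for moderate $\alpha$, since both powers underflow for very large $\alpha$.

```python
def p_cd_lower(q: float, alpha: int) -> float:
    """Lower bound q**alpha / (q**alpha + (1 - q)**alpha), with q = p**beta."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    if alpha < 1:
        raise ValueError("alpha must be a positive integer")
    all_fault_free = q ** alpha        # probability that no cluster is faulty
    all_faulty = (1.0 - q) ** alpha    # probability that every cluster is faulty
    return all_fault_free / (all_fault_free + all_faulty)


if __name__ == "__main__":
    for q in (0.6, 0.5, 0.4):  # illustrative values of q = p**beta
        print(q, [round(p_cd_lower(q, a), 4) for a in range(1, 9)])
```

For $q = 0.6$ the printed sequence increases toward one, for $q = 0.5$ it stays at 0.5, and for $q = 0.4$ it decreases toward zero.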


Lemmas 1 and 2 together with Equation (1) imply that (i) if
$p^{\beta} > 0.5$, then $P_{CD,m}(\alpha,\beta) > P_{C,m}$ for all
$\alpha \ge 2$; (ii) if $p^{\beta} = 0.5$, then
$P_{CD,m}(\alpha,\beta) = P_{C,m}$ for all $\alpha \ge 2$; and (iii) if
$p^{\beta} < 0.5$, then $P_{CD,m}(\alpha,\beta) < P_{C,m}$ for all
$\alpha \ge 2$. It follows then that the error detection scheme can be
used to increase our confidence in the output if and only if
$p^{\beta} > 0.5$. We note that in work done in the Fifties on
constructing reliable logic devices from unreliable components, von
Neumann recognized that redundancy can degrade overall performance unless
the components have some minimum reliability (see [NEU63, pp. 305-306,
322-324, 329-378]).

The  false  alarm  probability  will  be  considered  next. 

Lemma 3: Under Hypotheses 1 and 2,

$$P_{FA}(\alpha,\beta) \le 1 - p^{\beta(\alpha-1)} = P_{FA,M}(\alpha,\beta). \tag{7}$$


Proof: By definition

$$P_{FA}(\alpha,\beta) = P(b=1 \mid CC_1 \text{ output correct})
= 1 - P(b=0 \mid CC_1 \text{ output correct})
= 1 - \frac{P(b=0 \text{ and } CC_1 \text{ output correct})}{P(CC_1 \text{ output correct})}.$$

Now

$$P(b=0 \text{ and } CC_1 \text{ output correct}) = P(\text{all cluster outputs correct})
= \prod_{j=1}^{\alpha} P(CC_j \text{ output correct}),$$

and therefore

$$P_{FA}(\alpha,\beta) = 1 - \prod_{j=2}^{\alpha} P(CC_j \text{ output correct}).$$

If $CC_j$ is non-faulty, its output will be correct, and thus

$$P_{FA}(\alpha,\beta) \le 1 - \prod_{j=2}^{\alpha} P(CC_j \text{ non-faulty}) = 1 - p^{\beta(\alpha-1)}. \qquad \Box$$

Note that $P_{FA,M}(\alpha,\beta)$ goes to one as $\alpha$ goes to
infinity. Thus, if $p^{\beta} > 0.5$, we can ensure that
$P_{CD}(\alpha,\beta)$ is as close to one as we wish by choosing a
sufficiently large value for $\alpha$; the concomitant drawback is
that $P_{FA}(\alpha,\beta)$ may also be close to one. In other words, if
we accepted the output it would almost certainly be correct, but we would
almost never accept a correct output.
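The trade-off between the two bounds can be tabulated with a short sketch. The helper names `p_cd_lower` and `p_fa_upper`, and the sample value $q = p^{\beta} = 0.6$, are illustrative assumptions; the formulas are the lower bound of Lemma 1 and the upper bound of Lemma 3.

```python
def p_cd_lower(q: float, alpha: int) -> float:
    """Lower bound on P_CD from Lemma 1, with q = p**beta."""
    return q ** alpha / (q ** alpha + (1.0 - q) ** alpha)


def p_fa_upper(q: float, alpha: int) -> float:
    """Upper bound 1 - q**(alpha - 1) on the false alarm probability (Lemma 3)."""
    return 1.0 - q ** (alpha - 1)


if __name__ == "__main__":
    q = 0.6  # illustrative value of p**beta, chosen above 0.5
    print("alpha  P_CD lower  P_FA upper")
    for alpha in (1, 2, 4, 8, 16):
        print(f"{alpha:5d}  {p_cd_lower(q, alpha):10.4f}  {p_fa_upper(q, alpha):10.4f}")
```

Both columns grow with $\alpha$, which is the drawback noted above: more replication raises confidence in an accepted output but also raises the worst-case chance of rejecting a correct one.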


3.  EFFICIENT  COMPUTING  NETWORK  DESIGN 


Suppose that it is possible to solve a given problem in a
given time using a cluster of $\beta$ computing elements of a given type
T, and assume that the basic reliability $p$ of a type T computing
element is known. Furthermore, suppose that we want to design a computing
network consisting of one or more such clusters so that an accepted
output is correct with probability at least $\theta_1$
($0 < \theta_1 < 1$). To satisfy this requirement, it is sufficient, in
view of Lemma 1, to ensure that

$$\frac{p^{\beta\alpha}}{p^{\beta\alpha} + (1-p^{\beta})^{\alpha}} \ge \theta_1 \tag{8}$$

for some integer $\alpha \ge 1$.

Given $p$, $\beta$ and $\theta_1$, it may or may not be possible to
satisfy (8) with some integer $\alpha \ge 1$. If (8) can be satisfied,
the most efficient design is obtained when $\alpha$ is chosen to be the
smallest integer $\alpha^*$ that satisfies (8). In order to analyze the
feasibility and efficiency issues, we partition the set of all pairs
$(\theta_1,p^{\beta})$ into disjoint subsets $R_0$, $R_1$ and $R_2$ as
follows:

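$$R_0 = \{(\theta_1,p^{\beta}) \mid p^{\beta} < \theta_1 \le 0.5\} \cup \{(\theta_1,p^{\beta}) \mid p^{\beta} \le 0.5 < \theta_1\},$$

$$R_1 = \{(\theta_1,p^{\beta}) \mid p^{\beta} \ge \theta_1\},$$

$$R_2 = \{(\theta_1,p^{\beta}) \mid 0.5 < p^{\beta} < \theta_1\}.$$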

Lemma 4: If $(\theta_1,p^{\beta})$ is in $R_0$, then (8) cannot be
satisfied for any integer $\alpha \ge 1$. If $(\theta_1,p^{\beta})$ is in
$R_1$, then (8) can be satisfied and $\alpha^* = 1$. If
$(\theta_1,p^{\beta})$ is in $R_2$, then (8) can be satisfied and

$$\alpha^* = \left\lceil \frac{\log((1-\theta_1)/\theta_1)}{\log((1-p^{\beta})/p^{\beta})} \right\rceil, \tag{9}$$

where $\lceil z \rceil$ is defined as the smallest integer greater than or
equal to $z$.


Proof: If $(\theta_1,p^{\beta})$ is in $R_0$, Lemma 2 immediately implies
that there is no integer $\alpha \ge 1$ for which (8) can be satisfied. If
$(\theta_1,p^{\beta})$ is in $R_1$, then it is clear that (8) can be
satisfied with $\alpha = 1$. If $(\theta_1,p^{\beta})$ is in $R_2$, then
(8) may be rewritten as

$$\alpha \ge \frac{\log((1-\theta_1)/\theta_1)}{\log((1-p^{\beta})/p^{\beta})},$$

and the result follows immediately since $p^{\beta} < \theta_1$. □


Some examples will now be given.

Example 1: Suppose that $p = 0.9$, $\beta = 10$ and $\theta_1 = 0.8$. In
this case, $p^{\beta} = 0.348$, the pair $(\theta_1,p^{\beta})$ is in
$R_0$, and Lemma 4 implies that it is not possible to satisfy the desired
reliability constraint.

Example 2: Suppose that $p = 0.95$, $\beta = 10$ and $\theta_1 = 0.95$.
In this case, $p^{\beta} = 0.598$, the pair $(\theta_1,p^{\beta})$ is in
$R_2$, and $\alpha^* = 8$.

Example 1 clearly demonstrates the interdependence of the speed and
reliability constraints. Our only recourse here would be to choose a
different type of computing element: one with a larger $p$ and/or a
higher intrinsic speed allowing for a smaller $\beta$. Example 2 shows
that satisfaction of the reliability constraint may require a great deal
of replication: in this case, 80 computing elements instead of the
original 10.
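The feasibility test and the replication count of Lemma 4 can be checked numerically. The sketch below is an illustration under stated assumptions: the function names `classify_region` and `alpha_star` are hypothetical, the boundary case $p^{\beta} = 0.5 < \theta_1$ is treated as infeasible (by Lemma 2 the bound then stays at 0.5), and the ceiling is computed in floating point, so a ratio that is an exact integer could be affected by rounding.

```python
import math
from typing import Optional


def classify_region(theta1: float, q: float) -> str:
    """Classify the pair (theta1, q), with q = p**beta, into R0, R1 or R2."""
    if q >= theta1:
        return "R1"   # a single cluster already meets the constraint
    if q > 0.5:
        return "R2"   # replication can meet the constraint
    return "R0"       # q < theta1 and q <= 0.5: replication cannot help


def alpha_star(theta1: float, q: float) -> Optional[int]:
    """Smallest number of clusters satisfying (8), or None if infeasible."""
    region = classify_region(theta1, q)
    if region == "R0":
        return None
    if region == "R1":
        return 1
    ratio = math.log((1.0 - theta1) / theta1) / math.log((1.0 - q) / q)
    return math.ceil(ratio)


if __name__ == "__main__":
    print(alpha_star(0.8, 0.9 ** 10))    # Example 1: infeasible
    print(alpha_star(0.95, 0.95 ** 10))  # Example 2: eight clusters
```

With the data of Examples 1 and 2, the sketch reports an infeasible design and $\alpha^* = 8$, respectively.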


The design approach discussed above may lead to an unacceptably
large upper bound on the false alarm probability: in the case of Example
2, $P_{FA,M}(\alpha^*,\beta) = 0.972$. With this consideration in mind,
suppose now that we want to design a computing network so that not only
will an accepted output be correct with probability at least $\theta_1$
($0 < \theta_1 < 1$), but also a correct output will be rejected with
probability at most $\theta_2$ ($0 < \theta_2 < 1$). To satisfy these two
requirements, we need to ensure the existence of at least one integer
$\alpha \ge 1$ so that both (8) and the following inequality are
satisfied (see Lemma 3):

$$1 - p^{\beta(\alpha-1)} \le \theta_2. \tag{10}$$


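Given $p$, $\beta$, $\theta_1$ and $\theta_2$, it may or may not be
possible to satisfy both (8) and (10) with some integer $\alpha \ge 1$.
In order to analyze the feasibility issue, we partition the set of all
triples $(\theta_1,\theta_2,p^{\beta})$ into disjoint subsets $S_0$,
$S_1$ and $S_2$ as follows:

$$S_0 = \{(\theta_1,\theta_2,p^{\beta}) \mid p^{\beta} < \theta_1 \le \max[0.5,\, 1-\theta_2]\},$$

$$S_1 = \{(\theta_1,\theta_2,p^{\beta}) \mid p^{\beta} \ge \theta_1\},$$

$$S_2 = \{(\theta_1,\theta_2,p^{\beta}) \mid \theta_1 > \max[0.5,\, 1-\theta_2,\, p^{\beta}]\}.$$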


Lemma 5: If $(\theta_1,\theta_2,p^{\beta})$ is in $S_0$, then (8) and
(10) cannot be satisfied simultaneously for any integer $\alpha \ge 1$.
If $(\theta_1,\theta_2,p^{\beta})$ is in $S_1$, then (8) and (10) can be
satisfied with $\alpha = 1$.


Proof: If $p^{\beta} < \theta_1$, (8) cannot be satisfied with
$\alpha = 1$. If in addition, $\theta_1 \le 0.5$, then Lemma 2 implies
that (8) cannot be satisfied for any integer $\alpha \ge 2$. If
$p^{\beta} < \theta_1 \le 1-\theta_2$, then (10) cannot be satisfied for
any $\alpha \ge 2$ since

$$1 - p^{\beta(\alpha-1)} > 1 - \theta_1 \ge \theta_2.$$

If $(\theta_1,\theta_2,p^{\beta})$ is in $S_1$, then clearly (8) can be
satisfied with $\alpha = 1$, and (10) is always satisfied with
$\alpha = 1$. □


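Now define

$$g_1(\alpha) = \frac{1}{1 + ((1-\theta_1)/\theta_1)^{1/\alpha}}, \qquad \alpha = 2, 3, \ldots,$$

$$g_2(\alpha) = (1-\theta_2)^{1/(\alpha-1)}, \qquad \alpha = 2, 3, \ldots,$$

$$\alpha_0 = \left\lfloor 1 + \frac{\log(1-\theta_2)}{\log\theta_1} \right\rfloor,$$

$$q^* = \min \{\, \max[g_1(\alpha), g_2(\alpha)] \mid \alpha = 2, 3, \ldots, \alpha_0 \,\},$$

where $\lfloor x \rfloor$ is defined as the largest integer less than or
equal to $x$.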
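To make these definitions concrete, the following sketch computes $\alpha_0$ and $q^*$ directly. The Python function names are assumptions introduced for illustration, and the sketch presumes $\theta_1 > \max[0.5, 1-\theta_2]$ (the condition that defines $S_2$ apart from the bound on $p^{\beta}$), so that $\alpha_0 \ge 2$.

```python
import math


def g1(theta1: float, alpha: int) -> float:
    """Smallest q = p**beta for which (8) holds with alpha clusters."""
    return 1.0 / (1.0 + ((1.0 - theta1) / theta1) ** (1.0 / alpha))


def g2(theta2: float, alpha: int) -> float:
    """Smallest q = p**beta for which (10) holds with alpha clusters."""
    return (1.0 - theta2) ** (1.0 / (alpha - 1))


def alpha_0(theta1: float, theta2: float) -> int:
    """Largest cluster count worth examining when q < theta1."""
    return math.floor(1.0 + math.log(1.0 - theta2) / math.log(theta1))


def q_star(theta1: float, theta2: float) -> float:
    """Minimum over alpha = 2..alpha_0 of max(g1(alpha), g2(alpha))."""
    a0 = alpha_0(theta1, theta2)
    if a0 < 2:
        raise ValueError("requires theta1 > 1 - theta2 so that alpha_0 >= 2")
    return min(max(g1(theta1, a), g2(theta2, a)) for a in range(2, a0 + 1))


if __name__ == "__main__":
    # theta1 = 0.9 and theta2 = 0.2 are the values used in Example 4.
    print(alpha_0(0.9, 0.2), q_star(0.9, 0.2))
```

Here $g_1$ decreases and $g_2$ increases with $\alpha$, so the maximum of the two captures whichever constraint is binding for a given number of clusters.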


Lemma 6: Suppose that the triple $(\theta_1,\theta_2,p^{\beta})$ is in
$S_2$. If $p^{\beta} < q^*$, then (8) and (10) cannot be satisfied for any
integer $\alpha \ge 1$. If $p^{\beta} \ge q^*$, then there is at least one
integer $\alpha \ge 2$ for which (8) and (10) are satisfied. Furthermore,
$\theta_1 > q^* > 0.5$.


Proof: First note that since $(\theta_1,\theta_2,p^{\beta})$ is in $S_2$,

$$2 < 1 + \frac{\log(1-\theta_2)}{\log\theta_1} < \infty,$$

and thus $2 \le \alpha_0 < \infty$ and $q^*$ is well defined.
Furthermore, $\theta_1 > 0.5$ implies that $\theta_1 > g_1(2)$, and
$\theta_1 > 1-\theta_2$ implies that $\theta_1 > g_2(2)$. Thus
$\theta_1 > q^*$.

Since $p^{\beta} < \theta_1$, it is clear that (8) cannot be satisfied
with $\alpha = 1$. For every $\alpha > \alpha_0$, we have

$$\alpha > 1 + \frac{\log(1-\theta_2)}{\log\theta_1},$$

which can be rewritten as

$$1 - \theta_1^{\alpha-1} > \theta_2.$$

Therefore

$$1 - p^{\beta(\alpha-1)} > \theta_2$$

and (10) cannot be satisfied. Now let $p^{\beta} < q^*$. For each
$\alpha$ in the interval $[2,\alpha_0]$, either $p^{\beta} < g_1(\alpha)$,
in which case (8) cannot be satisfied, or else $p^{\beta} < g_2(\alpha)$,
in which case (10) cannot be satisfied. Thus, if $p^{\beta} < q^*$, (8)
and (10) cannot be satisfied for any integer $\alpha \ge 1$.

It is clear from the definition of $q^*$ that there is at least one
integer $\bar{\alpha}$ in the interval $[2,\alpha_0]$ for which

$$q^* = \max[g_1(\bar{\alpha}), g_2(\bar{\alpha})].$$

Thus, if $p^{\beta} \ge q^*$, then $p^{\beta} \ge g_1(\bar{\alpha})$ and
$p^{\beta} \ge g_2(\bar{\alpha})$, which implies that (8) and (10) are
satisfied with $\alpha = \bar{\alpha}$.

If $q^* \le 0.5$, then (8) cannot be satisfied for $p^{\beta} = q^*$. This
contradicts what has just been proved, and therefore we can conclude that
$q^* > 0.5$. □


Lemmas 5 and 6 show that the conditions under which replication and
error detection are useful for meeting constraints (8) and (10) are
quite restrictive: we need to have $\theta_1 > 0.5$,
$\theta_1 > 1-\theta_2$ and $p^{\beta}$ in the interval $[q^*,\theta_1)$.

If (8) and (10) can be satisfied, the most efficient design is
obtained when $\alpha$ is chosen to be the smallest integer
$\alpha^{**}$ that satisfies (8) and (10). The next lemma characterizes
$\alpha^{**}$.

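Lemma 7: If $(\theta_1,\theta_2,p^{\beta})$ is in $S_1$, then
$\alpha^{**} = 1$. If $(\theta_1,\theta_2,p^{\beta})$ is in $S_2$ and
$p^{\beta} \ge q^*$, then

$$\alpha^{**} = \left\lceil \frac{\log((1-\theta_1)/\theta_1)}{\log((1-p^{\beta})/p^{\beta})} \right\rceil.$$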

Proof: The first part of the lemma follows immediately from Lemma 5.
Now let $(\theta_1,\theta_2,p^{\beta})$ be in $S_2$ with
$p^{\beta} \ge q^*$. Rewrite (8) and (10), respectively, as

$$\alpha \ge \frac{\log((1-\theta_1)/\theta_1)}{\log((1-p^{\beta})/p^{\beta})},$$

$$\alpha \le 1 + \frac{\log(1-\theta_2)}{\log p^{\beta}}.$$

Lemma 6 implies that there is at least one integer $\alpha$ satisfying
both inequalities, and the result follows. □

We will now present some examples that illustrate the preceding
lemmas.

Example 3: Suppose that $p = 0.95$, $\beta = 10$, $\theta_1 = 0.8$ and
$\theta_2 = 0.1$. In this case, $p^{\beta} = 0.598$, the triple
$(\theta_1,\theta_2,p^{\beta})$ is in $S_0$, and Lemma 5 implies that it
is impossible to satisfy the reliability constraints.

Example 4: Suppose that $p = 0.98$, $\beta = 10$, $\theta_1 = 0.9$ and
$\theta_2 = 0.2$. In this case, $p^{\beta} = 0.817$, the triple
$(\theta_1,\theta_2,p^{\beta})$ is in $S_2$, $q^* = 0.8$ and Lemma 7
implies that $\alpha^{**} = 2$.
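Examples 3 and 4 can be cross-checked by testing constraints (8) and (10) directly for increasing $\alpha$, rather than through the closed form of Lemma 7. The sketch below does this by brute force; the function names are hypothetical, and it assumes $0 < \theta_1 < 1$ and $0 < \theta_2 < 1$. The search stops as soon as (10) fails, because the left-hand side of (10) does not decrease as $\alpha$ grows.

```python
from typing import Optional


def classify_triple(theta1: float, theta2: float, q: float) -> str:
    """Classify (theta1, theta2, q), with q = p**beta, into S0, S1 or S2."""
    if q >= theta1:
        return "S1"
    if theta1 <= max(0.5, 1.0 - theta2):
        return "S0"
    return "S2"


def satisfies_8(theta1: float, q: float, alpha: int) -> bool:
    """Constraint (8): lower bound on P_CD is at least theta1."""
    return q ** alpha / (q ** alpha + (1.0 - q) ** alpha) >= theta1


def satisfies_10(theta2: float, q: float, alpha: int) -> bool:
    """Constraint (10): upper bound on P_FA is at most theta2."""
    return 1.0 - q ** (alpha - 1) <= theta2


def alpha_double_star(theta1: float, theta2: float, q: float) -> Optional[int]:
    """Smallest alpha satisfying both (8) and (10), or None if none exists."""
    alpha = 1
    while satisfies_10(theta2, q, alpha):
        if satisfies_8(theta1, q, alpha):
            return alpha
        alpha += 1
    return None


if __name__ == "__main__":
    print(classify_triple(0.8, 0.1, 0.95 ** 10), alpha_double_star(0.8, 0.1, 0.95 ** 10))
    print(classify_triple(0.9, 0.2, 0.98 ** 10), alpha_double_star(0.9, 0.2, 0.98 ** 10))
```

With the data of Example 3 the search finds no admissible $\alpha$, and with the data of Example 4 it returns two clusters, in agreement with Lemmas 5 and 7.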

4.  CONCLUSION

The approach used in this paper has been based on three principles:
(i) one should distinguish between hardware faults in a computing
network and incorrect results produced by the network; (ii) one should
assume as little as possible about the fault mechanism since in general
one knows very little about it; (iii) one should use only those
quantities that have some chance of being experimentally measured. These
considerations rule out, in particular, the use of a failure probability
distribution [BAR65]. They also lead us to a worst case analysis.

Clearly, if one does not adhere to these principles and is willing
to make stronger, more optimistic, assumptions, then error detection
based on strict redundancy will look more powerful. For example, in
view of Equation (6), it is clear that if, in addition to Hypotheses 1
and 2, we assume that when all cluster outputs are incorrect they are
not all identical, then $P(B_\alpha(\alpha,\beta) \text{ and } b=0) = 0$,
and thus $P_{CD}(\alpha,\beta) = 1$. In this case
$P_{FA,M}(\alpha,\beta)$ is minimized when we use the smallest possible
number of clusters, namely $\alpha = 2$.

Alternatively, if in addition to Hypotheses 1 and 2, we assume that
the number of faulty computing elements is at most $\gamma$, then the
choice of any $\alpha \ge \gamma+1$ ensures that at least one computing
cluster always produces the correct output. In this case,
$P(B_\alpha(\alpha,\beta)) = 0$, which implies
$P(B_\alpha(\alpha,\beta) \text{ and } b=0) = 0$, and we again obtain
$P_{CD}(\alpha,\beta) = 1$. The upper bound on the false alarm
probability is then minimized by choosing $\alpha = \gamma+1$.